Balancing the Communication Load of Asynchronously Parallelized Machine Learning Algorithms
Stochastic Gradient Descent (SGD) is the standard numerical method used to solve the core optimization problem for the vast majority of machine learning (ML) algorithms. In the context of large-scale learning, as required by many Big Data applications, efficient parallelization of SGD is the focus of active research. Recently, we showed that the asynchronous communication paradigm can be applied to achieve a fast and scalable parallelization of SGD: Asynchronous Stochastic Gradient Descent (ASGD) outperforms other, mostly MapReduce-based, parallel algorithms for solving large-scale machine learning problems. In this paper, we investigate the impact of asynchronous communication frequency and message size on the performance of ASGD applied to large-scale ML on HTC cluster and cloud environments. We introduce a novel algorithm for the automatic balancing of the asynchronous communication load, which allows ASGD to adapt to changing network bandwidths and latencies.
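A minimal sketch of the balancing idea, assuming a hypothetical rule that caps communication overhead at a fixed fraction of compute time; the paper's actual balancing criterion and measured quantities may differ:

    def balanced_comm_interval(comm_time, compute_time_per_batch,
                               target_overhead=0.1, lo=1, hi=1000):
        """Pick how many SGD mini-batches to run between asynchronous
        parameter exchanges so that the measured communication time stays
        below a target fraction of the compute time. The capping rule and
        its parameters are illustrative, not the paper's criterion."""
        interval = comm_time / (target_overhead * compute_time_per_batch)
        return max(lo, min(hi, int(round(interval))))

    # Example: a 50 ms exchange over a slow link vs. 5 ms per mini-batch
    # yields an interval of 100 batches between exchanges.
    print(balanced_comm_interval(comm_time=0.050, compute_time_per_batch=0.005))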
Asynchronous Parallel Stochastic Gradient Descent - A Numeric Core for Scalable Distributed Machine Learning Algorithms
The implementation of the vast majority of machine learning (ML) algorithms boils down to solving a numerical optimization problem. In this context, Stochastic Gradient Descent (SGD) methods have long proven to provide good results, both in terms of convergence and accuracy. Recently, several parallelization approaches have been proposed to scale SGD to very large ML problems. At their core, most of these approaches follow a map-reduce scheme. This paper presents a novel parallel updating algorithm for SGD, which utilizes the asynchronous single-sided communication paradigm. Compared to existing methods, Asynchronous Parallel Stochastic Gradient Descent (ASGD) provides faster (or at least equal) convergence, close to linear scaling, and stable accuracy.
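A toy shared-memory sketch of the asynchronous update scheme, with Hogwild-style threads standing in for the paper's distributed one-sided communication; the merge rule and schedule are illustrative assumptions:

    import threading
    import numpy as np

    def asgd(X, y, n_workers=4, epochs=5, lr=0.01, comm_interval=10):
        """Toy shared-memory ASGD for least squares: each worker updates
        the shared weights asynchronously instead of synchronizing through
        a map-reduce style barrier."""
        w = np.zeros(X.shape[1])                      # shared parameters
        shards = np.array_split(np.arange(len(X)), n_workers)

        def worker(idx):
            local = w.copy()
            for step, i in enumerate(np.tile(idx, epochs)):
                grad = (local @ X[i] - y[i]) * X[i]   # per-sample gradient
                local -= lr * grad
                if step % comm_interval == 0:         # asynchronous exchange:
                    w[:] = 0.5 * (w + local)          # no barrier, races allowed
                    local = w.copy()

        threads = [threading.Thread(target=worker, args=(s,)) for s in shards]
        for t in threads: t.start()
        for t in threads: t.join()
        return w

    X = np.random.randn(1000, 10)
    y = X @ np.ones(10)
    print(np.round(asgd(X, y), 2))                    # approaches all-ones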
On Invariance, Equivariance, Correlation and Convolution of Spherical Harmonic Representations for Scalar and Vectorial Data
The mathematical representation of data in the Spherical Harmonic (SH) domain has recently regained increasing interest in the machine learning community. This technical report gives an in-depth introduction to the theoretical foundations and practical implementation of SH representations, summarizing works on rotation-invariant and equivariant features as well as convolutions and exact correlations of signals on spheres. These methods are then generalized from scalar SH representations to Vectorial Harmonics (VH), providing the same capabilities for 3D vector fields on spheres.
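As a minimal scalar-field illustration of one invariant covered by such methods, the SH power spectrum (energy per degree l) is unchanged under rotations; a numerical-quadrature sketch using SciPy's sph_harm, with grid and quadrature choices as assumptions:

    import numpy as np
    from scipy.special import sph_harm  # SciPy's (azimuth, polar) argument order

    def sh_power_spectrum(f, max_l=8):
        """Rotation-invariant SH power spectrum of a scalar signal sampled
        on an equiangular (polar, azimuth) grid: project onto Y_lm by
        quadrature, then sum |c_lm|^2 over m for each degree l."""
        n_th, n_ph = f.shape
        polar = (np.arange(n_th) + 0.5) * np.pi / n_th
        azim = np.arange(n_ph) * 2 * np.pi / n_ph
        P, A = np.meshgrid(polar, azim, indexing='ij')
        dA = np.sin(P) * (np.pi / n_th) * (2 * np.pi / n_ph)  # area element
        power = []
        for l in range(max_l + 1):
            c = [np.sum(f * np.conj(sph_harm(m, l, A, P)) * dA)
                 for m in range(-l, l + 1)]
            power.append(np.sum(np.abs(c) ** 2))
        return np.array(power)  # invariant under any rotation of the sphere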
Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions
Generative convolutional deep neural networks, e.g. popular GAN architectures, rely on convolution-based up-sampling methods to produce non-scalar outputs like images or video sequences. In this paper, we show that common up-sampling methods, known as up-convolution or transposed convolution, cause such models to fail to reproduce the spectral distributions of natural training data correctly. This effect is independent of the underlying architecture, and we show that it can be used to easily detect generated data like deepfakes with up to 100% accuracy on public benchmarks. To overcome this drawback of current generative models, we propose to add a novel spectral regularization term to the training optimization objective. We show that this approach not only allows training spectrally consistent GANs that avoid high-frequency errors, but also that a correct approximation of the frequency spectrum has positive effects on the training stability and output quality of generative networks.
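The spectral statistic commonly used for this kind of analysis is the radially averaged 2D power spectrum; a sketch (normalization and windowing details may differ from the paper's):

    import numpy as np

    def azimuthal_power_spectrum(img):
        """Radially averaged 2D power spectrum of one image channel.
        Up-convolved (generated) images tend to deviate from natural
        ones at the high-frequency end of this profile."""
        F = np.fft.fftshift(np.fft.fft2(img))
        power = np.abs(F) ** 2
        h, w = img.shape
        Y, X = np.ogrid[:h, :w]
        r = np.hypot(Y - h // 2, X - w // 2).astype(int)  # radius per pixel
        totals = np.bincount(r.ravel(), weights=power.ravel())
        counts = np.bincount(r.ravel())
        return totals / np.maximum(counts, 1)             # mean power per radius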
Unsupervised Multiple Person Tracking using AutoEncoder-Based Lifted Multicuts
Multiple Object Tracking (MOT) is a long-standing task in computer vision. Current approaches based on the tracking-by-detection paradigm either require some sort of domain knowledge or supervision to associate data correctly into tracks. In this work, we present an unsupervised multiple object tracking approach based on visual features and minimum cost lifted multicuts. Our method builds on straightforward spatio-temporal cues that can be extracted from neighboring frames in an image sequence without supervision. Clustering based on these cues enables us to learn the required appearance invariances for the tracking task at hand and to train an autoencoder that generates suitable latent representations. The resulting latent representations can thus serve as robust appearance cues for tracking, even over large temporal distances where no reliable spatio-temporal features can be extracted. We show that, despite being trained without the provided annotations, our model provides competitive results on the challenging MOT Benchmark for pedestrian tracking.
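A hedged sketch of how such pairwise cues could be turned into edge costs for a multicut-style tracking graph; the weights and cost form are illustrative assumptions, not the paper's exact formulation:

    import numpy as np

    def edge_cost(box_a, box_b, z_a, z_b, w_spatial=1.0, w_app=1.0):
        """Hypothetical pairwise cost combining a spatio-temporal cue
        (IoU of detection boxes in nearby frames) with an appearance cue
        (cosine similarity of autoencoder latents z)."""
        x1, y1 = np.maximum(box_a[:2], box_b[:2])   # boxes as [x1, y1, x2, y2]
        x2, y2 = np.minimum(box_a[2:], box_b[2:])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
        iou = inter / (area(box_a) + area(box_b) - inter + 1e-9)
        cos = z_a @ z_b / (np.linalg.norm(z_a) * np.linalg.norm(z_b) + 1e-9)
        # lower cost = more likely the same person; a lifted multicut solver
        # would then partition the detection graph using these edge costs
        return -(w_spatial * iou + w_app * cos)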
Using GPI-2 for Distributed Memory Parallelization of the Caffe Toolbox to Speed up Deep Neural Network Training
Deep Neural Networks (DNNs) are currently of great interest in research and application. The training of these networks is a compute-intensive and time-consuming task. To reduce training times to a bearable amount at reasonable cost, we extend the popular Caffe toolbox for DNNs with an efficient distributed memory communication pattern. To achieve good scalability, we emphasize the overlap of computation and communication and prefer fine-granular synchronization patterns over global barriers. To implement these communication patterns, we rely on the Global address space Programming Interface version 2 (GPI-2) communication library, which provides a light-weight set of asynchronous one-sided communication primitives supplemented by non-blocking, fine-granular data synchronization mechanisms. We call our parallel version of Caffe CaffeGPI. First benchmarks demonstrate better scaling behavior compared with other extensions, e.g., Intel Caffe. Even within a single symmetric multiprocessing machine with four graphics processing units, CaffeGPI scales better than the standard Caffe toolbox. These first results demonstrate that the use of standard High Performance Computing (HPC) hardware is a valid cost-saving approach to train large DNNs. I/O is another bottleneck when working with DNNs in a standard parallel HPC setting, which we will consider in more detail in a forthcoming paper.
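GPI-2 itself is a C library, but the one-sided communication pattern described above can be sketched with mpi4py's RMA windows as an analogous API; this is an analogy only, not the CaffeGPI implementation:

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    grads = np.zeros(1024, dtype='d')        # local gradient buffer
    win = MPI.Win.Create(grads, comm=comm)   # expose it for one-sided access

    update = np.random.rand(1024)
    target = (rank + 1) % size
    win.Lock(target, MPI.LOCK_SHARED)        # no barrier, no recv on the target
    win.Put(update, target_rank=target)      # asynchronous one-sided write
    win.Unlock(target)                       # completes the transfer
    # ... overlap: the next forward/backward pass would run here ...
    win.Free()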
On the Interplay of Convolutional Padding and Adversarial Robustness
It is common practice to apply padding prior to convolution operations to preserve the resolution of feature maps in Convolutional Neural Networks (CNNs). While many alternatives exist, this is often achieved by adding a border of zeros around the inputs. In this work, we show that adversarial attacks often result in perturbation anomalies at the image boundaries, which are the areas where padding is used. Consequently, we aim to provide an analysis of the interplay between padding and adversarial attacks and seek an answer to the question of how different padding modes (or their absence) affect adversarial robustness in various scenarios.
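The padding modes typically compared in such an analysis are the ones PyTorch exposes directly; a minimal sketch showing that only the border treatment differs between them:

    import torch
    import torch.nn as nn

    # The same 3x3 convolution under PyTorch's built-in padding modes;
    # only how the boundary is filled differs, so any robustness gap must
    # come from the border treatment.
    x = torch.randn(1, 3, 32, 32)
    for mode in ("zeros", "reflect", "replicate", "circular"):
        conv = nn.Conv2d(3, 16, kernel_size=3, padding=1, padding_mode=mode)
        print(mode, tuple(conv(x).shape))    # resolution preserved: 32x32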
Is RobustBench/AutoAttack a suitable Benchmark for Adversarial Robustness?
Recently, RobustBench (Croce et al. 2020) has become a widely recognized benchmark for the adversarial robustness of image classification networks. In its most commonly reported sub-task, RobustBench evaluates and ranks the adversarial robustness of trained neural networks on CIFAR10 under AutoAttack (Croce and Hein 2020b) with l-inf perturbations limited to eps = 8/255. With the currently best-performing models scoring around 60%, it is fair to characterize this benchmark as quite challenging. Despite its general acceptance in recent literature, we aim to foster discussion about the suitability of RobustBench as a key indicator for robustness that generalizes to practical applications. Our line of argumentation against this is two-fold and supported by extensive experiments presented in this paper: We argue that I) the alteration of data by AutoAttack with l-inf, eps = 8/255 is unrealistically strong, resulting in close-to-perfect detection rates of adversarial samples even by simple detection algorithms and human observers, while other attack methods are much harder to detect at similar success rates; and II) that results on low-resolution data sets like CIFAR10 do not generalize well to higher-resolution images, as gradient-based attacks appear to become even more detectable with increasing resolution.
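A typical evaluation of the kind the benchmark reports, using the public robustbench and autoattack packages; the model entry and batch size here are illustrative:

    from robustbench.data import load_cifar10
    from robustbench.utils import load_model
    from autoattack import AutoAttack

    # "Standard" is the non-robust baseline entry on the leaderboard
    model = load_model(model_name="Standard", dataset="cifar10",
                       threat_model="Linf").eval()
    x, y = load_cifar10(n_examples=64)       # CIFAR10 tensors in [0, 1]
    adversary = AutoAttack(model, norm="Linf", eps=8 / 255)
    x_adv = adversary.run_standard_evaluation(x, y, bs=64)
    # the paper's point: at eps = 8/255 the perturbation x_adv - x is
    # strong enough to be flagged by simple detectors and human observers
    print(float((x_adv - x).abs().max()))    # <= 8/255 by construction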
FrequencyLowCut Pooling -- Plug & Play against Catastrophic Overfitting
Over the last years, Convolutional Neural Networks (CNNs) have been the dominating neural architecture in a wide range of computer vision tasks. From an image and signal processing point of view, this success might be a bit surprising, as the inherent spatial pyramid design of most CNNs apparently violates basic signal processing laws, namely the sampling theorem, in their down-sampling operations. However, since poor sampling appeared not to affect model accuracy, this issue was broadly neglected until model robustness started to receive more attention. Recent work [17] in the context of adversarial attacks and distribution shifts showed that there is a strong correlation between the vulnerability of CNNs and aliasing artifacts induced by poor down-sampling operations. This paper builds on these findings and introduces an aliasing-free down-sampling operation that can easily be plugged into any CNN architecture: FrequencyLowCut pooling. Our experiments show that, in combination with simple and fast FGSM adversarial training, our hyper-parameter-free operator significantly improves model robustness and avoids catastrophic overfitting.
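A minimal sketch of the underlying idea, cropping the centered spectrum to remove all frequencies above the Nyquist limit of the target resolution; details of the paper's actual operator may differ:

    import torch

    def frequency_lowcut_pool(x):
        """Aliasing-free 2x down-sampling in the spirit of FrequencyLowCut
        pooling: keep only the low-frequency band of the centered spectrum,
        then transform back at the smaller size."""
        n, c, h, w = x.shape
        F = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
        F = F[..., h // 4:h // 4 + h // 2, w // 4:w // 4 + w // 2]
        y = torch.fft.ifft2(torch.fft.ifftshift(F, dim=(-2, -1)))
        return y.real / 4   # rescale for the smaller transform size

    x = torch.randn(2, 3, 32, 32)
    print(frequency_lowcut_pool(x).shape)    # torch.Size([2, 3, 16, 16])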
Don't Look into the Sun: Adversarial Solarization Attacks on Image Classifiers
Assessing the robustness of deep neural networks against out-of-distribution inputs is crucial, especially in safety-critical domains like autonomous driving, but also in safety systems where malicious actors can digitally alter inputs to circumvent safety guards. However, designing effective out-of-distribution tests that encompass all possible scenarios while preserving accurate label information is a challenging task. Existing methodologies often entail a compromise between variety and constraint levels for attacks, and sometimes even both. As a first step towards a more holistic robustness evaluation of image classification models, we introduce an attack method based on image solarization that is conceptually straightforward yet avoids jeopardizing the global structure of natural images, independent of its intensity. Through comprehensive evaluations of multiple ImageNet models, we demonstrate the attack's capacity to degrade accuracy significantly, provided it is not integrated into the training augmentations. Interestingly, even then, no full immunity to accuracy deterioration is achieved. In other settings, the attack can often be simplified into a black-box attack with model-independent parameters. Defenses against other corruptions do not consistently extend to our specific attack.
Project website: https://github.com/paulgavrikov/adversarial_solarizatio
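A simplified sketch of a threshold-sweep, black-box variant of such an attack; the paper's exact search procedure may differ:

    import torch

    def solarize(x, threshold):
        """Solarize images in [0, 1]: invert every pixel at or above the
        threshold, leaving the global image structure intact."""
        return torch.where(x >= threshold, 1.0 - x, x)

    def solarization_attack(model, x, y, thresholds=None):
        """Black-box sweep: keep, per image, the solarization threshold
        that most reduces the confidence in the true class."""
        thresholds = thresholds if thresholds is not None \
            else torch.linspace(0.05, 0.95, 19)
        worst, worst_conf = x.clone(), torch.ones(len(x))
        with torch.no_grad():
            for t in thresholds:
                x_sol = solarize(x, t)
                conf = model(x_sol).softmax(-1).gather(1, y[:, None]).squeeze(1)
                mask = conf < worst_conf
                worst[mask], worst_conf[mask] = x_sol[mask], conf[mask]
        return worst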